Foreword

This study is part of the PTDC/SAU-ESA/101228/2008 project - Forensic
Entomology: Morphometric and Molecular databank (mtDNA) to identify species
(Diptera and Coleoptera) with forensic interest - funded by Fundação para a
Ciência e Tecnologia (FCT).

This thesis is built around two papers to be submitted to international
journals. However, since it is an academic work (submitted for a master's
degree), a general introduction and a set of final considerations were added.

The articles are presented according to the standards of the journal to
which they will be submitted:

• Journal of Forensic Sciences (American Academy of Forensic Sciences) 

- Cytochrome c oxidase I effectiveness as a marker for insects’ identification; 

- Forensic relevant insects’ identification through GenBank and BOLD 
databases. 

Summary

Forensic entomology is the science that applies knowledge of insects, and
other arthropods, to legal proceedings. The first step in forensic entomology
is species identification, normally carried out using morphometric characters
and dichotomous identification keys; however, morphological examination is
sometimes slow and inconclusive. Molecular methods, on the other hand, provide
fast and accurate identification, allow insects to be identified at any
developmental stage, including the larval stages, and can be used regardless
of how well the specimens are preserved.

Indeed, methods for the molecular identification of species have advanced
considerably, and DNA barcoding is currently considered a very useful tool for
species identification. The concept rests on the amplification and sequencing
of a short DNA segment, known as the barcode sequence, from a standard region
of the genome. Several studies suggest the sequence coding for subunit I of the
cytochrome c oxidase protein (COI) as the appropriate DNA marker for DNA
barcoding. Once the sequence of the target specimen has been obtained, it can
be compared with reference sequences, that is, sequences from previously
identified species already held in a digital library.

Species identification through DNA barcoding requires that, in a phylogenetic
analysis, each species appears as a monophyletic group. Although this approach
relies on tree-building methods, the result should not be interpreted as a
phylogeny, since the barcode sequence frequently lacks sufficient phylogenetic
signal to resolve evolutionary relationships. Another criterion for species
delineation rests on threshold values for intra- and interspecific nucleotide
divergence. One threshold is 3% (the value established for insects):
intraspecific divergences below this limit indicate a single species, and
interspecific divergences above it point to different species. The other
threshold, proposed as an update of the first, states that the mean nucleotide
divergence between species of the same genus should be 10 times higher than the
mean intraspecific divergence found for those species. Taken together, these
three criteria make it possible to determine whether specimens belong to the
same species or to different species.
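
The two threshold criteria can be illustrated with a minimal sketch. It is not
the procedure used in this thesis: it assumes sequences that are already
aligned to equal length, uses the uncorrected proportion of differing sites
(p-distance) as a simplified stand-in for whatever divergence model a study
adopts, and the function names `p_distance` and `barcode_criteria` are
hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def barcode_criteria(species_a, species_b, threshold=0.03, ratio=10.0):
    """Apply the 3% threshold and the 10x rule to two congeneric species.

    species_a and species_b are lists of aligned sequences, with at least
    two sequences each (an assumption of this sketch).
    """
    # All within-species pairwise distances, pooled over both species.
    intra = [p_distance(x, y)
             for group in (species_a, species_b)
             for x, y in combinations(group, 2)]
    # All between-species pairwise distances.
    inter = [p_distance(x, y) for x in species_a for y in species_b]
    mean_intra, mean_inter = mean(intra), mean(inter)
    return {
        "mean_intra": mean_intra,
        "mean_inter": mean_inter,
        # Every intraspecific value below 3%, every interspecific value above.
        "passes_3_percent": max(intra) < threshold < min(inter),
        # Mean interspecific divergence at least 10 times the intraspecific mean.
        "passes_10x": mean_inter >= ratio * mean_intra,
    }
```

Given two or more aligned sequences per species, `barcode_criteria` returns the
mean divergences and whether each threshold is met. The monophyly criterion
still has to be checked separately on a tree.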

The Barcode of Life Data System (BOLD) is a software platform for managing
data obtained through DNA barcoding. The BOLD identification system is the
functional unit for specimen identification: the sequence obtained is
submitted and compared with reference sequences, much as in the systems used
by other databases for species identification (for example, the GenBank
database of the National Center for Biotechnology Information, NCBI).

Entomological evidence can be of great importance in forensic cases, since it
can provide information that guides the course of a criminal investigation.

Creating and implementing a database of insect species is an important step
for forensic entomology. Any country that wants an effective and
scientifically well-supported forensic entomology service must have a
comprehensive knowledge of its insect diversity. DNA barcoding appears useful
for identifying the insect species found at forensic scenes. Despite the
scientific and practical advantages of knowing the insect diversity of any
region of the globe, using this genetic marker in databases requires that its
effectiveness in distinguishing species be established first.

This study was developed and written with a view to preparing two scientific
papers to be submitted to international journals in the field. Accordingly,
this dissertation is divided into four parts.

Chapter 1 is the general introduction, a literature review of the state of the
art in forensic entomology and DNA barcoding that provides the foundation for
the work.

Chapter 2 presents the first paper, entitled “Cytochrome c oxidase I
effectiveness as a marker for insects’ identification”. Its main objectives
were to determine, for each specimen, the sequence of the COI region used for
DNA barcoding, that is, a 658 base pair fragment from the start of the COI
gene, and to test its effectiveness for identifying Diptera species of forensic
relevance. The study used 52 individuals belonging to four Diptera species:
Calliphora vicina (Robineau-Desvoidy, 1830), Calliphora vomitoria (Linnaeus,
1758), Lucilia caesar (Linnaeus, 1758) and Musca autumnalis (De Geer, 1776).
These specimens had been collected and morphologically identified in an earlier
study. Amplification with universal primers and sequencing of the region were
easily achieved, which is a considerable advantage where samples must be
analysed quickly, as in forensic contexts. The phylogenetic analysis identified
each species as a monophyletic group. The analysis of intra- and interspecific
nucleotide divergences for the two species of the same genus confirmed that,
under both thresholds used for identification through DNA barcoding, they are
different species. These results show the effectiveness of COI as a genetic
marker for species discrimination.

Chapter 3 presents the second paper, entitled “Forensic relevant insects’
identification through GenBank and BOLD databases”. Its main objective was to
determine how well these public databases identify insect species of forensic
interest. The data were also used to assess the effectiveness of the COI
marker. As before, all samples were easily amplified and sequenced. The GenBank
database identified 67.6% of the individuals to species level, and the BOLD
database identified 58.8% of the specimens, also to species level. In total, 49
specimens belonging to 11 different species were identified: Eudasyphora
cyanella (Meigen, 1826), Lucilia caesar (Linnaeus, 1758), Pollenia rudis
(Fabricius, 1794), Musca autumnalis (De Geer, 1776), Phaonia subventa (Harris,
1780), Phaonia tuguriorum (Scopoli, 1763), Helina impuncta (Fallén, 1825),
Helina evecta (Harris, 1780), Helina reversio (Harris, 1780), Hydrotaea
dentipes (Fabricius, 1805) and Hydrotaea armipes (Fallén, 1825). The sequences
from these samples were then used for the phylogenetic analysis and for
calculating intra- and interspecific nucleotide divergences. The phylogenetic
analysis showed monophyly for all species, and the threshold values for
nucleotide divergence between species of the same genus allowed each species to
be discriminated. In short, these results corroborate the effectiveness of the
COI gene for species identification.

Finally, Chapter 4 contains the final considerations, which discuss the
importance of this work for applying the COI marker in databases used not only
in forensic contexts but also for the global knowledge of biological diversity,
as well as its contribution towards a national biodiversity database.

Keywords: Forensic Entomology; DNA barcoding; Cytochrome c Oxidase I;
Diptera; Database; Barcode of Life Data System; GenBank.

Abstract

Forensic entomology is the science that applies knowledge of insects (and
other arthropods) to civil proceedings and criminal trials. Entomological
evidence can be of great importance in forensic cases because it can provide
information relevant to the course of an investigation; for this, however,
species-level identification of the specimens found on a corpse is essential.
The use of cytochrome c oxidase I (COI) as the molecular marker of the DNA
barcoding project suggests that this approach could be very useful at forensic
scenes, where fast and accurate tools for species identification are essential.
Implementing a molecular database of insect species is a very important step in
the development of forensic entomology. Indeed, any country that wishes to have
an effective and scientifically well-supported forensic entomology service must
have a comprehensive knowledge of its insect diversity. The main goals of this
study are to provide evidence that COI can serve as an effective, reliable and
fast tool for an identification database, and to determine to what extent the
Barcode of Life Data System (BOLD) and GenBank databases were able, at the time
of the study, to identify insect species of forensic relevance. The COI
fragment proposed as the DNA barcode was sequenced, nucleotide sequence
divergences within and between species were calculated, and a phylogenetic
analysis was performed. In both studies, COI discriminated species as strongly
supported monophyletic groups, and intra- and interspecific nucleotide
divergences confirmed the potential of COI for species delimitation. The
results also showed that GenBank identified more sequences than BOLD, although
both databases showed a good ability to identify insect species.

Keywords: Forensic Entomology; Cytochrome c Oxidase I; DNA barcoding; 
Database; Barcode of Life Data System; GenBank; Diptera. 

Chapter 1 - General Introduction


Entomology derives from the Greek words entomon (insect) and logos (word,
reason) and means the study of insects (Gupta and Setia, 2004). Forensic
entomology is thus the science that applies knowledge of insects (and other
arthropods) to civil proceedings and criminal trials (Turchetto and Vanin, 2004).

According to Byrd (2006), forensic entomology commonly comprises three
general areas: medicolegal or medicocriminal, urban, and stored product pests.
The medicolegal area studies, for legal purposes, the necrophagous insects that
colonise human corpses. Urban forensic entomology deals with insects that
affect humans and their immediate environment. It has both civil and criminal
components, since urban pests feed on both the living and the dead. Their
mandibles can cause damage that leads to economic losses, and they can also
produce marks and wounds on the skin that may be misinterpreted as prior abuse.
The stored products area deals with the contamination of food and drink by
insects. Here forensic entomology helps to determine the insect species
involved, whether their presence is accidental or intentional, and whether the
level of infestation is within allowable limits (Byrd, 2006). According to
Anderson (1999), wildlife forensic entomology should also be considered. This
area is particularly relevant to the surveillance and protection of wild
animals in captivity against mistreatment.


1.1 Retrospective 

The first documented forensic entomology case was reported by the Chinese
lawyer and death investigator Sung Tzu in the 13th century. In his book “Hsi
yuan chi lu” (one possible translation is “The Washing Away of Wrongs”), Sung
Tzu describes what is possibly the first case in which insects helped to solve
a crime (Benecke, 2001; Amendt et al., 2004; Gupta and Setia, 2004).

During medieval times, sculptors, painters and poets, as well as medical and
legal experts, closely observed the decomposition of human bodies and produced
realistic and detailed illustrations of corpses containing maggots
(Benecke, 2001) (Figure 1).



Figure 1. Illustration of corpses containing maggots: (left) "Dance of the Death" (15th century); (right) 
grave of Robert Touse (exact time of making unknown) (From: Benecke, 2001). 


In 1855, Bergeret reported the first modern forensic entomology case, using
insects to estimate the postmortem interval (PMI) (Benecke, 2001). Later,
Yovanovich and Mégnin were the first forensic examiners to try to evaluate
insect succession on corpses, properly establishing the science of forensic
entomology (Amendt et al., 2004). In 1894, Mégnin published his most important
book, “La Faune des Cadavres”, in which he set out his theory of eight
successional waves of insects on freely exposed corpses (Benecke, 2001) and
noted that on buried bodies insects arrive in two waves (Gupta and Setia,
2004). He also described the morphological features of various classes of
insects that help in their identification (Gupta and Setia, 2004).

Since the beginning of the 20th century, interest in the subject has
increased, as has knowledge of the properties of insects. Forensic entomology
is now accepted in many countries as an important tool, and many studies have
been carried out on the subject.


1.2 Postmortem changes of the human body 

After death, most animal bodies undergo a process of decomposition that
results in the gradual dissolution of the tissues (autolysis) into gases,
liquids and salts, caused essentially by proteolytic and other enzymes released
by bacteria (Gordon et al., 1988). Alternatively, an abnormal transformation of
the corpse can occur depending on environmental conditions (maceration in
immersed bodies, mummification in a dry environment) (Campobasso et al., 2001).

During decomposition, the body temperature decreases, a phenomenon known as
algor mortis, and the skin color becomes red (livor mortis or lividity).
Another sign of death is the stiffening of the muscle fibers due to the
breakdown of glycogen and the accumulation of lactic acid (rigor mortis).
Later, skin slippage (the loosening of the epidermis from the underlying
dermis) occurs and hair and nails are easily detached. The production of a
large quantity of gases during putrefaction causes physical distortion of the
body, and a green coloration appears in the superficial blood vessels, the
gastrointestinal region and those portions of the body where livor mortis was
most marked. All these changes occur within 72-96 h after death. Finally, once
the temperature of the body has reached that of the environment and the initial
putrefaction has passed, no reliable estimation of the postmortem interval
(PMI) is possible (Amendt et al., 2004). Following this initial stage, also
known as the fresh stage, the body undergoes further transformations through
four more main stages: putrefaction, dark putrefaction, butyric fermentation
and the dry stage (Bornemissza, 1957).

The postmortem decay rate depends on intrinsic and extrinsic factors. The
intrinsic factors comprise the age and constitution of the body, the cause of
death and the integrity of the corpse (Campobasso et al., 2001). Extrinsic
factors such as the ambient temperature, the humidity of the atmosphere, the
movement of air or other medium, the state of hydration of the tissues, the
nature of the medium, the nature of the soil and depth also strongly influence
the rate of decomposition (Gordon et al., 1988). Clothing can slow postmortem
body cooling and favor the onset of putrefaction, and animal predators, from
arthropods to mammals, can play a predominant role in the breakdown of the
corpse (Campobasso et al., 2001).


1.3 Insects and the corpse 

A cadaver constitutes a dynamic system that shelters and supports a rich 
community, of which arthropods form an important part, not only because they 
consume decomposing tissue but also because they speed up the decomposition 
processes (Arnaldos et al., 2004). The colonization of a corpse by arthropods,
and more precisely by insects, continues throughout decomposition, from the
first few minutes after death until the bones reach the bleached white stage
(Haskell et al., 1997).

1.3.1 Role of arthropods in decomposition 

The cadaver can be colonized by a variable number of arthropods, but only a
few species actively participate in cadaver breakdown, directly accelerating
the rate of decay (Campobasso et al., 2001).

Each group of arthropods plays a different role in the different stages of
the decomposition of organic matter. Their development on the cadaver is
affected by several factors, of which temperature is the most important: it
affects the rate of development and may cause diapause (the complete suspension
of development) (Myskowiak and Doums, 2002). Under favourable conditions,
certain species of flies may lay their eggs or deposit larvae on exposed
bodies. In the case of the egg-laying species, after a variable period that
depends mainly on the atmospheric temperature, the eggs hatch and the larvae
feed upon the tissues, so that a considerable amount of tissue is lost after
death (Gordon et al., 1988).

Colonizing species are selectively attracted by the state of decomposition of
the carrion. They form complex communities comprising necrophagous species
(also known as scavengers), which feed only on decomposing tissues; predators
and parasites of the necrophagous species, which feed on other insects or
arthropods; omnivorous species, which feed both on the decomposing remains and
on the associated arthropods; and other species that use the corpse as an
extension of their habitat and part of their environment (Amendt et al., 2004).
In general, necrophagous, necrophilous and omnivorous species are the most
important groups in forensic studies. Within these, the necrophagous species
that appear in a predictable sequence are the most important for forensic
investigations (Arnaldos et al., 2004).

1.3.2 Forensic evidence 

The study of the order of appearance of arthropods on a corpse can provide
conclusive evidence in forensic casework (Arnaldos et al., 2001). Indeed, the
arthropods collected from a corpse have been shown to be very useful for
estimating the time since death (Amendt et al., 2000; Turchetto et al., 2001;
Wells et al., 2001; Arnaldos et al., 2004; Saigusa et al., 2009).

According to Marchenko (2001), the scientific basis for using entomological
data in forensic entomology comprises: (1) the existence in nature of
necrophagous insects, which use cadaver tissues and spend the major part of
their life cycle on cadavers; (2) the relative constancy and specificity of the
cadaver entomofauna in a particular geographical region, comprising widespread
predominant species; (3) the correspondence between the species composition of
the cadaver entomofauna and the degree of tissue decomposition and the location
of the cadaver; (4) seasonal changes in the predominant necrophagous insect
species; (5) the beginning of insect activity in spring and its end in autumn
as a result of the transition to diapause, brought about by changes in
temperature and light-time interval whose values depend on the geographical
region and are specific to each species; (6) the regulation of the number of
generations per vegetative period and of the duration of the species life cycle
by strictly defined, species-specific thermal parameters; (7) the long
preservation of insect chitin cuticles in nature.

1.3.3 Species with forensic relevance 

For the purposes of forensic entomology, the two most important groups of
insects are Diptera (flies) and Coleoptera (beetles) (Haskell et al., 1997).
Depending on the biogeographical region and ecological habitat, different
species of insects are involved in the decay of a corpse, but generally the
first insects of the succession to colonize a cadaver belong to the order
Diptera.

In the Diptera, the blowflies are the most important species in forensic
cases. These are the bright metallic blue and green “bottle” flies. Because of
their huge numbers, blowflies are the major agents in the degradation of the
cadaver. They are mostly diurnal and usually rest at night (Chaubert et al.,
2003). Within the order Diptera, families such as Calliphoridae, Sarcophagidae
and Muscidae are highly relevant as forensic indicators (Arnaldos et al.,
2001). Calliphoridae and Muscidae were found to be the first to colonise the
cadaver, as soon as 2-3 h after exposure, followed by Sarcophagidae. The
preferred oviposition sites were generally the eyes, nasal openings, mouth and
ears and, towards the end of the fresh stage, the genitals (scrotum and
vagina). Depending on the external temperature, hatching took place 6 to 40 h
after oviposition, larval development lasted between 3 and 10 days, and
pupariation occurred 6-18 days before the emergence of adults. Fly activity
continued until the dry stage of decomposition (Campobasso et al., 2001).
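
As a rough arithmetic illustration of these ranges, the sketch below adds up
the shortest and longest stage durations quoted above to bracket the time from
oviposition to the end of each stage. It assumes that the three ranges simply
follow one another, treats the 6-18 days of pupariation as the duration of the
puparial stage, and ignores the species and temperature dependence that real
casework must account for. The names `STAGE_RANGES_H` and `cumulative_windows`
are hypothetical.

```python
# (minimum, maximum) duration of each stage in hours, taken from the ranges
# quoted in the text; treating them as consecutive is an assumption.
STAGE_RANGES_H = {
    "egg": (6, 40),
    "larva": (3 * 24, 10 * 24),
    "puparium": (6 * 24, 18 * 24),
}


def cumulative_windows(stage_ranges):
    """Return, per stage, the (earliest, latest) hours since oviposition
    at which that stage ends, by summing the minimum and maximum durations."""
    windows = {}
    earliest = latest = 0
    for stage, (low, high) in stage_ranges.items():
        earliest += low
        latest += high
        windows[stage] = (earliest, latest)
    return windows
```

Under these assumptions, adult emergence would fall between 222 h and 712 h
(roughly 9 to 30 days) after oviposition, which shows how wide the window is
when temperature is not taken into account.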

DNA BARCODING AND FORENSIC ENTOMOLOGY: A MOLECULAR APPROACH FOR DIPTERA SPECIES’ IDENTIFICATION

Coleoptera increase both in number of species and in number of individuals
in the later stages of body decomposition. Some Coleoptera species colonize
corpses as necrophagous insects while others are predators of Diptera larvae.
Beetle activity (mainly Dermestidae) is essentially associated with the most
advanced stages of the degradation process, causing the drying out of
semi-liquid soft tissues (Campobasso et al., 2001). In the case of Dermestidae,
the larval stages, which are the real indicators of time since death, are
characteristic of the most advanced stages of decomposition, even though adult
specimens are known to appear on corpses from a very early time (Arnaldos et
al., 2004).

Other groups known to frequent decomposing carrion include the insect orders
Hymenoptera (bees, wasps, ants), Lepidoptera (butterflies and moths),
Hemiptera (true bugs) and Dictyoptera (cockroaches), and the Acari (mites) of
the class Arachnida (spiders, ticks and mites). Of these groups, species of
Hymenoptera are the most common. Wasps and ants are the main predators of fly
eggs and larvae, while bees feed occasionally on fluids. Butterflies and moths
have been observed to feed on seepage from carcasses, while bugs have been seen
probing into the carrion and feeding on the underlying tissues. Cockroaches
usually cause superficial feeding artifacts on the skin of the corpse and may
also be responsible for chewing off the eyebrows and eyelashes. Among the
Acari, certain mite species are associated with decomposing human remains;
however, because they are very small, they are overlooked as evidence. These
arthropods appear when the remains are in advanced decay and drying, and they
are detected only because they form aggregates that look like mold or piles of
sawdust (Haskell et al., 1997).


1.4 Importance of Forensic Entomology 

Forensic entomology can provide answers to several questions that may be
raised in a forensic case.

Firstly, forensic entomology aims to establish the time since death, known
as the postmortem interval (PMI), or more precisely, how long a carrion has
been exposed in the environment. Using medical techniques, such as measuring
body temperature or analyzing livor and rigor mortis, the time since death can
only be measured accurately for the first 2 or 3 days after death. In contrast,
by calculating the age of immature insect stages feeding on a corpse and
analyzing the necrophagous species present on the cadaver, postmortem intervals
from the first day to several weeks can be estimated (Hall and Amendt, 2007).
According to Hall and Haskell (cited by Haskell et al., 1997), the PMI can be
determined using two entomological methodologies. The first is based on the
known life cycle of an insect species (particularly the blowfly life cycle)
(Figure 2). The second method, proposed by Mégnin and other workers, is based
on the evaluation of insect successional waves, that is, the nature of the
insect fauna present on the corpse at any given time (Figure 3).



Figure 2. Example of a typical blowfly cycle. (1) Oviposition: eggs white to yellow. (2) Eclosion: maggot
emerges. (3) Larva I: length about 10 mm. (4) Larva II: length 20 mm, (a) food in crop. (5) Larva III: length
45 mm, (a) blood in crop; (b) internal skeleton for feeding. (6) Postfeeding larva III: (a) internal feature
obscured. (7) Puparium: changes color with age, (a) early stage; (b) late stage. (8) Eclosion: adult fly
emerges. (9) After hardening, adult male and female flies seek mates. (10) Following copulation, female
completes egg development. (11) Female lays egg mass (oviposits) on carrion/corpse at moist sites. (12)
Female lays several egg masses in her adult life (1 to 3 weeks) (From: Haskell et al., 1997).

[Figure 3 chart. Rows (insect families): Calliphoridae (blow flies), Muscidae
(muscid flies), Silphidae (carrion beetles), Sarcophagidae (flesh flies),
Histeridae (clown beetles), Staphylinidae (rove beetles), Nitidulidae (sap
beetles), Cleridae (checkered beetles), Dermestidae (dermestid beetles),
Scarabaeidae (lamellicorn beetles). Columns (stages of decomposition, each
given the same amount of space): fresh, bloated, decay, dry. Bar thickness
indicates a small, moderate or large number of individuals present.]

Figure 3. Example of adult arthropods succession on human cadavers in east Tennessee (during spring and 
summer) (From: Rodriguez and Bass, 1983). 


Secondly, the specimens collected from the corpse can establish whether the
body was transferred after death and, consequently, where it lay initially,
whether it was hidden and where. This is possible because, although some common
species are relatively ubiquitous, others are found only in certain
geographical areas and in a relatively definable environment (indoor or
outdoor; rural or urban; wet or dry), so their presence can suggest that the
body was moved after death (Haskell et al., 1997). Additionally, large
accumulations of remnants left by insects (puparia of earlier generations of
fly larvae, skins of beetle larvae, the bodies of dead insects and solid larval
excrement) build up when a decomposing body lies for a long period, and this
can help to confirm that the body has lain undisturbed in situ for an extended
time (Archer et al., 2005). In the same way, the presence of live maggots or
insect remnants at a location in the absence of a dead body is almost certain
evidence that some kind of corpse has been removed from the scene (Campobasso
and Introna, 2001).

Forensic entomology is also used in the diagnosis of poisoning. When bodies
are in an advanced state of decomposition or skeletonized, the examination for
toxicologically relevant substances may be difficult due to the lack of
appropriate sources such as tissue, blood or urine (Amendt et al., 2004).
Maggots feeding on intoxicated tissues, on the other hand, take up drugs and
toxins into their own metabolism (Campobasso and Introna, 2001), and these are
deposited in the fat bodies and the exoskeletal material (chitin) of the
insect. The ingested drugs are held in the chitin and remain in the specimen
for an extended period of time (Haskell et al., 1997). Consequently, a thorough
toxicological analysis of necrophagous larvae and remains from a corpse may be
crucial to the correct determination of the cause of death (Campobasso and
Introna, 2001). However, toxic substances are known to modify the development
rate of maggots, so the insect life-stage method must be applied with care to
avoid errors in PMI estimation.

Another aim of forensic entomology is the detection of neglect (Benecke and
Lessig, 2001; Anderson and Huitson, 2004; Archer et al., 2005). The
colonization of living people and animals is known as myiasis, and the
occurrence of maggots in wounds or natural orifices may indicate neglect and
can help to estimate how long it lasted. However, these colonizers belong to
the same species found in the early decomposition stages of corpses, which can
complicate the estimation of the PMI.

Finally, other questions, such as the time of decapitation and/or
dismemberment, the submersion interval, the identification of specific sites of
injury on the body and of postmortem artifacts (both on the body and at the
crime scene), the association of a suspect with the crime scene, and sexual
molestation, can be answered through entomological investigation.

These findings can then inform several stages of the criminal justice 
process: the initial scene investigation, the subsequent follow-up investigative 
process when evaluating suspects and witnesses, and the criminal trial. 


1.5 DNA barcoding

Accurate identification of an insect specimen is usually a crucial first
step in a forensic entomological analysis. Closely related carrion species can
differ substantially in growth rate, diapause response or ecological
preferences. Species-diagnostic anatomical characters are not known for the
immature stages of many forensically important insects, and an existing key may
be incomplete or difficult for non-specialists to use (Wells and Stevens,
2008), yet correct species determination is indispensable in forensic
investigations.

The identification of insects based on deoxyribonucleic acid (DNA) can be 
performed on immature insects or on fragments of puparia and adult insects; 
it provides a much faster identification and thus facilitates the successful 
conclusion of a case (Harvey et al., 2003; Mazzanti et al., 2010). According to 
Amendt et al. (2004), polymerase chain reaction (PCR) amplification of suitable 
regions of the genome, sequence analysis of the amplicons obtained, and 
alignment of the data with reference sequences is the usual and recommended 
method. 

Today, DNA barcoding has emerged as a molecular approach to species 
identification. The concept is based on a DNA sequence that acts as a barcode 
specific to each species (Hebert et al., 2003). A DNA barcode is thus a 
short sequence of nucleotides, taken from an appropriate part of an organism’s 
genome, that is used to identify it at species level. 

Species identification by DNA barcoding is a sequencing-based 
technology. Once the sequence of the target specimen has been obtained, it can 
be compared with a library of sequences from known species 
(Hajibabaei et al., 2007). Several libraries of DNA sequences are now 
available. Some of these repositories are comprehensive and include sequences from 
several segments of DNA (e.g. GenBank), whereas others are restricted to a specific 
marker (e.g. BOLD) (see Chapter 3). 

The key point for any taxonomic system is its ability to deliver accurate 
species identification and, according to Hebert et al. (2003), DNA barcoding 
accurately identified species in more than 95% of cases. 

1.5.1 Nuclear DNA versus Mitochondrial DNA 

Generally, the mitochondrial genome (mtDNA) of animals is a better 
target for analysis than the nuclear genome because of its high copy number, 
lack of introns, limited exposure to recombination and haploid mode of 
inheritance (Hebert et al., 2003); it therefore has an increased chance of 
generating species-specific markers (Harvey et al., 2003). In animals, mtDNA 
occurs as a single double-helical circular molecule containing 13 protein-coding 
genes, 2 ribosomal RNA genes, a non-coding control region, and several 
transfer RNA genes. Each mitochondrion contains several such circular molecules 
and, therefore, several complete sets of mitochondrial genes. Furthermore, each 
cell has several mitochondria. Thus, when sample tissue is limited, the 
mitochondrion offers a relatively abundant source of DNA (Waugh, 2007). 
These features make mtDNA clearly advantageous in forensic studies, where the 
material may consist only of fragments or be poorly preserved. 

1.5.2 Cytochrome c oxidase subunit I (COI) as DNA barcoding marker 

The efficacy of DNA barcoding depends on the selection of a suitable segment 
of DNA. Its mutation rate must be slow enough that intraspecific 
variation is minimised but sufficiently rapid to highlight interspecific variation; 
it must be relatively easy to collect; and it should have as few insertions or 
deletions as possible to facilitate sequence alignment (Hebert et al., 2003). 

In 2003, Hebert et al. published a study in which they suggested the use of 
cytochrome c oxidase I as a suitable marker for DNA barcoding. 

Eukaryotic cytochrome c oxidase, the last enzyme of the mitochondrial 
respiratory chain, is highly conserved across species that employ oxidative 
phosphorylation for metabolism and is a multimeric enzyme of dual genetic 
origin. Subunits I, II and III are large, highly hydrophobic transmembrane 
proteins encoded in the mitochondrial genome (Figure 4). The remaining small 
subunits that surround the core of the enzyme are encoded in the nuclear 
genome (Fontanesi et al., 2008). Cytochrome c oxidase subunit I (COI), the 
catalytic subunit of the enzyme, is predominantly embedded in the membrane of 
the mitochondrial crista. The nucleotides of the gene that codes for it show 
sufficient variation to differentiate between species (Waugh, 2007). Indeed, 
Hebert et al. (2003) state that COI has two important advantages: (1) the 
universal primers for this gene are very robust, enabling recovery of its 5’ end 
from representatives of most, if not all, animal phyla and (2) COI appears to 
possess a greater range of phylogenetic signal than any other mitochondrial gene 
(the evolution of this gene is rapid enough to allow the discrimination of not only 
closely allied species, but also phylogeographic groups within a single species). 



Figure 4. Gene map of the D. yakuba mtDNA molecule (From: Clary and Wolstenholme, 1985). 


However, according to Frézal and Leblois (2008), DNA barcoding has some 
crucial pitfalls. First, the existence of an under-described fraction of 
biodiversity complicates the identification of unknown specimens, since the 
individuals chosen to represent each taxon in the reference database may not 
cover all of the existing diversity in that taxon. Second, the risks inherent in 
mitochondrial inheritance can lead to over- or underestimation of sample 
divergence and render conclusions on species status unclear. Indeed, 
heteroplasmy (i.e. the presence of more than one type of mitochondrial genome 
within a single individual) and maternally transmitted bacteria (e.g. 
Wolbachia; Whitworth et al., 2007) can mislead identification. Third, 
nuclear mitochondrial pseudogenes (NUMTs), that is, non-functional copies of 
mitochondrial DNA sequences translocated into the nuclear genome (Song et al., 
2008), can mimic mitochondrial copies of COI, introducing ambiguity into the 
barcode and disturbing specimen identification. Fourth, because the rate of 
evolution of the COI marker is not equal across all living species, the marker 
can lack resolving power in some groups. Finally, intraspecific 
geographical structure can generate high rates of intraspecific divergence that 
blur and distort species delineation. 

Despite these shortcomings, DNA barcoding may prove to be an efficient 
tool for rapid assessment of taxonomic diversity, especially in species groups that 
are otherwise difficult to study (Linares et al., 2009) and, consequently, could be 
very helpful in forensic entomology investigations (see Chapter 2). 


1.6 Framing within the Master's degree 

The difficulties in morphological identification of some insects and the 
possible association of these to a forensic context show the necessity of molecular 
identification of species found in these scenarios. 

This dissertation seeks to understand the importance of Forensic Biology, 
in the areas of Molecular Biology and Genetics as well as Forensic Entomology, 
when applied to legal and criminal investigation. 

Within the master's degree in Biologia Humana e Ambiente, this work 
is a contribution towards filling the gap in forensic entomology in Portugal, 
particularly in the molecular systematics of insects. Moreover, it 
will be the first step in the creation of a National Molecular Database of 
insect species with forensic relevance, based on a new concept for species 
identification: DNA barcoding. 


1.7 Main goals 

In Portugal, forensic entomology is still a very undeveloped area, and this 
study aims to help fill this gap. 

For the purposes of this study, we focus on medicolegal and 
wildlife forensic entomology, because of the involvement of insects in the 
decomposition of cadavers. 

Thereby, the main goals of this study are: 

• to determine the DNA barcode sequences of some insect species 
(previously identified by morphological methods); 

• to test the effectiveness of COI for their identification; 

• to evaluate whether the databases that currently exist (e.g. GenBank from 
NCBI; BOLD from CBOL) are able to identify species with forensic relevance 
based on the COI sequence; 

• to contribute to the implementation of a National Molecular Database 
applicable to Portugal. 

Beyond these main objectives, this thesis also aims at the acquisition of 
skills in laboratory practice and in the analysis of the results obtained 
during the laboratory work. 


Chapter 2 - Cytochrome c oxidase I effectiveness as a marker for insects’ identification 


Abstract 

The implementation of a molecular database of insect species is a very 
important step for the evolution of forensic entomology. Indeed, any country that 
wishes to have an effective and scientifically well-supported forensic entomology 
service must have a comprehensive knowledge of its insect diversity. 

The widespread use of cytochrome c oxidase I (COI) as the molecular marker of 
choice for DNA barcoding projects suggests that this approach could also be 
very useful in forensic settings, where rapid and precise species 
identification tools are vital. Despite the scientific and pragmatic advantages 
of knowing the diversity of insects of forensic interest across the globe, the 
implementation of such a molecular database first requires establishing the 
ability of the marker to distinguish between species in a forensic context. 

Using four common fly species known to be forensically relevant 
(Calliphora vicina, Calliphora vomitoria, Lucilia caesar and Musca autumnalis), 
this study aimed to provide evidence of the performance of COI as an 
effective, reliable and fast tool for an identification database. 

The COI fragment proposed as the DNA barcode was sequenced; nucleotide 
sequence divergence within and between species was then calculated and 
phylogenetic analyses were performed. 

Phylogenetic analyses show all species as strongly supported 
monophyletic groups. Intraspecific divergence within Calliphora shows an 
average value of 0.24%, and the average interspecific divergence 
between these congeneric species was 4.9%. The highest interspecific divergence 
values occur between M. autumnalis and the other three species. This 
species belongs to Muscidae whereas the other three belong to Calliphoridae, 
and it is therefore phylogenetically more distant. 

According to our molecular data, this method appears to be an accurate 
and robust technique for identifying at least these most common fly species with 
forensic relevance. 

Keywords: forensic science; database; forensic entomology; Diptera; cytochrome 
c oxidase I; DNA barcoding. 


1. Introduction 

DNA barcoding is a new molecular tool useful in species discrimination, 
which uses a small DNA fragment - known as the DNA barcode - from a 
standardized region of the genome (1). This fragment consists of a 658 bp string 
corresponding to nucleotide positions 1490-2198 from the 5' end of the cytochrome c 
oxidase subunit I gene (COI), using the Drosophila yakuba mitochondrial genome as 
a reference (2). 

Forensic entomology studies the interaction of insects and other 
arthropods with dead bodies and, like other forensic sciences, is used for legal 
purposes (3). Different insect species colonizing corpses have different biologies 
(life cycle, ecological preferences, distribution, etc.) and, based on this, a forensic 
entomologist can provide answers to several questions at a crime scene: 
estimation of postmortem interval (PMI), postmortem transfer, diagnosis of 
poisoning, and neglect of living people (4). Since corpses are colonized in 
successive waves and the colonization pattern changes regionally and seasonally, 
identifying which species colonize the corpse is the key to the forensic 
entomologist’s work. Thus, identification of insects collected from a corpse must 
be precise; otherwise, the application of erroneous developmental data may 
result in an incorrect PMI estimation (3). 


Species have so far been identified mostly through morphological 
criteria. Morphological identification relies on keys based on anatomical 
characters, usable only by a few experts, to identify the 
adults (or larvae and pupae in some cases) to species level. However, for most 
groups, keys, when available, can be vague, making identification 
difficult or almost impossible. In addition, the larval stage is the one most 
often found on corpses (5), and the time-consuming rearing of larvae to adults for 
identification may delay a criminal investigation or cause significant problems 
when rearing fails (6). Under these circumstances, species identification based 
on molecular analysis appears to be a more suitable way of identifying unknown 
specimens. Compared with morphological identification, molecular 
data acquisition is less time-consuming and can also be the 
only way to identify damaged organisms or fragments, which are very common in 
forensic scenarios (7,8). Furthermore, molecular identification can be the only 
option when there are no obvious means to match adults with immatures, and when 
morphological traits do not clearly discriminate species (9). 

When the DNA barcoding concept is used for insect species identification, 
three main criteria for species delimitation should be taken into account: 

1) The use of a threshold value to separate intraspecific from interspecific 
variation, the so-called “barcoding gap” (1,10). For example, in insects, 
genetic distance between different species almost always exceeds 3% (1); 

2) The second criterion comes as an update of the previous one, and suggests 
that this threshold value should be ten times greater than the average 
intraspecific nucleotide distance for different animal species (11); 

3) Finally, the monophyletic association of specimens within a species in a 
phylogenetic analysis is required for a successful species identification 
(12,13), meaning that each morphological species should appear as a 
single monophyletic lineage (14). However, although this criterion relies on 
tree construction methods, the resulting trees should not be interpreted as 
phylogenies, since DNA barcodes frequently do not carry 
sufficient phylogenetic signal to determine evolutionary relationships 
(15). 
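
The third criterion can be illustrated with a minimal Python sketch. It assumes a rooted tree written as nested tuples and uses hypothetical specimen labels (neither comes from this study); a species is treated as monophyletic when some clade contains all of its specimens and no others.

```python
def leaves(tree):
    """Return the set of leaf labels under a node (a str leaf or a tuple of subtrees)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, specimens):
    """True if some clade of the rooted tree holds exactly the given specimens."""
    target = set(specimens)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, target) for child in tree)


# Hypothetical toy tree: two species, each forming its own clade, plus an outgroup.
toy_tree = ((("vicina_1", "vicina_2"), ("vomitoria_1", "vomitoria_2")), "outgroup")

print(is_monophyletic(toy_tree, ["vicina_1", "vicina_2"]))     # True
print(is_monophyletic(toy_tree, ["vicina_1", "vomitoria_1"]))  # False
```

The sketch ignores branch support; in practice a clade would also be expected to carry high bootstrap or posterior support before being accepted.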

After sequencing, an unknown insect sequence can be compared with a 
library of barcode reference sequences obtained from specimens of known 
identity. If it matches a reference sequence with a high level of confidence, it 
can be assumed that the unknown specimen belongs to the reference taxon 
(species) or, at least, to the same group of species. If, on 
the other hand, the unknown sequence does not match any sequence in the 
database, the new data can be recorded as a new haplotype or a geographical 
variant, or may point to a new species (6,15). Finally, this information can be 
cross-referenced with prior knowledge of the developmental stages of each 
species and with ecological data, allowing the determination of aspects 
relevant for medicolegal purposes, including PMI. 
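
The comparison step described above can be sketched as follows. This is a simplified illustration, not the search algorithm used by GenBank or BOLD: it assumes pre-aligned sequences of equal length, a hypothetical 97% identity cut-off, and toy reference sequences with made-up species labels.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identify(query, library, min_identity=97.0):
    """Return (species, identity) for the best reference, or (None, identity) below the cut-off."""
    best_species, best_identity = None, 0.0
    for species, reference in library.items():
        identity = percent_identity(query, reference)
        if identity > best_identity:
            best_species, best_identity = species, identity
    if best_identity >= min_identity:
        return best_species, best_identity
    return None, best_identity


# Toy 20-bp "references" (hypothetical); real barcodes are about 658 bp.
toy_library = {
    "species_A": "ATGGCATTAGCCTTAGGAAT",
    "species_B": "ATGGCTTTAGCTTTAGGTAT",
}

print(identify("ATGGCATTAGCCTTAGGAAT", toy_library))  # matches species_A
print(identify("TTTTTTTTTTTTTTTTTTTT", toy_library))  # no confident match
```

A query that falls below the cut-off returns no species, which corresponds to the case where the sequence may represent a new haplotype, a geographical variant or an unrecorded species.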

However, before COI is adopted as a molecular tool in forensic 
entomology, it is necessary to ascertain its suitability for insect species 
identification. To this end, several specimens of the Diptera Calliphora vicina 
(Robineau-Desvoidy, 1830), Calliphora vomitoria (Linnaeus, 1758) and Lucilia 
caesar (Linnaeus, 1758), all belonging to Calliphoridae, and Musca autumnalis 
(De Geer, 1776), belonging to Muscidae, were sequenced in order to 
evaluate the effectiveness of COI as a DNA barcoding marker in 
databases for the identification of insect species of forensic interest. 


2. Materials and Methods 

2.1 Samples 

Insect specimens used in this work were obtained in a previous study 
(16). Samples were collected from mammalian carcasses exposed to the air in the 
central region of Portugal (Serra da Estrela mountains) during the winter of 
2008. Insects were captured in pitfall and Malaise traps, and specimens were 
subsequently stored individually in 70% ethanol. 

All samples were morphologically identified to species level by an expert 
entomologist. These identifications revealed specimens of four Diptera species: 
Calliphora vicina (13 specimens), Calliphora vomitoria (12 specimens), Lucilia 
caesar (8 specimens) and Musca autumnalis (19 specimens). 

2.2 DNA extraction 

DNA was extracted from 2-3 legs of each adult fly using the E.Z.N.A.® Insect 
DNA Isolation Kit (Omega Bio-Tek, USA) following the manufacturer’s protocol with 
an overnight incubation step. To maximize the final yield of DNA, 45 µL of Elution 
Buffer, preheated to 60-70 °C, was added and left to incubate for 30-50 
minutes before centrifuging and collecting the flow-through. The flow-through of 
the two elutions was collected in two different microtubes. Specimens’ remains were 
retained to check their identity if necessary. 

2.3 Polymerase chain reaction (PCR) 

The COI barcoding region was amplified using the primer pair LCO1490 (5’ 
GGTCAACAAATCATAAAGATATTGG 3’) and HCO2198 (5’ 
TAAACTTCAGGGTGACCAAAAAATCA 3’) (1,2). 
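
The relation between the primer positions (1490 and 2198) and the 658 bp barcode length can be checked with a short calculation. The sketch below assumes that both numbers are the D. yakuba coordinates of the 5' nucleotide of each primer, so that the amplicon includes both primers; this reading is an assumption, not something stated in the text.

```python
forward_primer = "GGTCAACAAATCATAAAGATATTGG"
reverse_primer = "TAAACTTCAGGGTGACCAAAAAATCA"

# Assumption: 1490 and 2198 mark the 5' nucleotide of each primer, so the
# amplicon spans both positions inclusively.
amplicon_length = 2198 - 1490 + 1

# The barcode proper is what remains once both primer sequences are trimmed.
barcode_length = amplicon_length - len(forward_primer) - len(reverse_primer)

print(amplicon_length, barcode_length)  # 709 658
```

Under that assumption the trimmed fragment comes out at 658 bp, which agrees with the barcode length used throughout this chapter.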

Each 25 µL PCR mixture contained 1X Colorless GoTaq® Flexi Reaction 
Buffer (Promega, USA), 100 µM of dNTPs (Fermentas, USA), 2 mM MgCl2, 0.4 
µM of each primer, 0.32 µg of BSA, 0.02 U GoTaq® Flexi DNA Polymerase 
(Promega, USA), 4-5 µL of DNA extract, and water added to complete the 
volume. PCR temperature cycles were carried out in a GeneAmp® PCR System 
2700 thermocycler (Applied Biosystems, USA) and consisted of an initial 
denaturation step at 94 °C for 1 minute, followed by 5 cycles of 94 °C for 30 
seconds, 45 °C for 1 minute, and 72 °C for 1 minute, and 35 cycles of 94 °C for 1 
minute, 50 °C for 1 minute and 30 seconds, and 72 °C for 1 minute. The last cycle 
was followed by 5 minutes at 72 °C to complete any partially synthesized strands 
(adapted from (1)). Amplified products were stored at 4 °C in the original PCR 
mix. All PCR products were checked for bands on a 1.5% agarose electrophoresis 
gel stained with RedSafe (iNtRON Biotechnology, Korea) under UV 
transillumination. 
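
As a planning aid, the cycling programme above can be written down as data and its total programmed time computed. This is only a sketch: the representation is illustrative, and the figure excludes the temperature ramping time of the thermocycler, which is not given in the text.

```python
# Each stage: (number of cycles, [step durations in seconds]).
pcr_program = [
    (1, [60]),           # initial denaturation, 94 °C
    (5, [30, 60, 60]),   # 94 °C / 45 °C / 72 °C
    (35, [60, 90, 60]),  # 94 °C / 50 °C / 72 °C
    (1, [300]),          # final extension, 72 °C
]

total_seconds = sum(cycles * sum(steps) for cycles, steps in pcr_program)
print(total_seconds, "s =", total_seconds // 60, "min")  # 8460 s = 141 min
```

The programmed holds therefore add up to a little under two and a half hours per run, before ramping is taken into account.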

PCR products were purified with SureClean (Bioline, UK), according to 
manufacturer’s instructions, and were stored at -20 °C. 

2.4 Sequencing 

DNA was sequenced in both forward and reverse directions for all 
specimens using the same primers used in amplification. Sequencing reactions 
were performed on purified PCR products with the BigDye® Terminator v3.1 
Sequencing Kit (Applied Biosystems, USA), using a GeneAmp® PCR System 
2700 thermocycler. Sequencing reaction conditions consisted of an initial 
denaturation step at 96 °C for 1 minute, followed by 25 cycles of 10 seconds at 96 
°C, 5 seconds at 50 °C, and 4 minutes at 60 °C. Each reaction (10 µL) was then 
purified by transferring the whole product to a clean 1.5 mL tube with 1 µL of 3 M 
sodium acetate, pH 4.6, and 25 µL of absolute ethanol. The mixture was 
incubated on ice for 30 minutes and centrifuged at maximum speed for 25 
minutes. The supernatant was discarded, 300 µL of 70% ethanol was added to the 
pellet, and tubes were centrifuged for another 15 minutes. This last step was 
repeated once, after which the supernatant was discarded completely and samples 
were air-dried away from light. 

Sequencing products were then analyzed using an ABI PRISM 310 Genetic 
Analyzer (Applied Biosystems, USA). 

When this step could not be carried out in our laboratory, samples 
were sent to a sequencing company (Macrogen Inc., Korea). 

2.5 Sequence analysis 

The sequence chromatograms obtained were edited, and differences between 
forward and reverse sequences were resolved, using Sequencher® v4.0.5 software 
(Gene Codes Corp., USA). Before analysis, all sequences were checked with the 
GenBank BLASTn search engine (17) to confirm the morphological identification. 
An additional COI sequence, from the Hypoderma lineatum (Villers, 1789) mitochondrial 
genome (accession number NC_013932), was obtained from the public DNA database 
GenBank (18) to be used as outgroup in all analyses. 

Sequences obtained in this study were aligned using ClustalX v2.0.12 
(19), and BioEdit Sequence Alignment Editor v7.0.5.3 (20) was used to prepare 
the alignment file for subsequent analyses. This file was then converted to NEXUS 
format with Concatenator v1.1.0 software (21) to be used in sequence divergence 
and phylogenetic analyses. 

The optimal model of nucleotide substitution for the data was determined 
using Modeltest v3.7 (22), run in PAUP* v4.0b10 (23), according to the Akaike 
information criterion (AIC). The general time-reversible model with gamma 
distribution shape parameter (GTR+G) was identified as the most suitable for data 
analysis. 

Phylogenetic analyses were carried out in PAUP* software using 
Maximum Parsimony (MP), Neighbor-Joining (NJ) and Maximum Likelihood 
(ML) methods, and in MrBayes v3.1.2 (24) for Bayesian analysis. 

MP analysis was conducted using the heuristic search procedure (Tree 
Bisection and Reconnection algorithm, TBR) with a maxtree setting of 100 trees 
to find the most parsimonious trees. Bootstrap values of MP analysis (1000 
replicates) were obtained under the heuristic search procedure. 

An NJ tree was constructed using the GTR+G model, and 1000 bootstrap 
replicates were used to calculate support for nodes. 

For ML analysis, the GTR+G model was also used, with 1000 bootstrap 
replicates and 1 replicate for tree base construction. 

Bayesian analysis was carried out using the Markov chain Monte Carlo 
(MCMC) method implemented in MrBayes (25). This Bayesian inference analysis 
was conducted using one cold and three heated chains and the GTR+G model, 
chosen by MrModeltest v2.3 (26) as the best model for this analysis (according 
to AIC). The analysis ran for 1,500,000 generations, sampling every 100 
generations; to evaluate when stationarity had been reached, the likelihood 
scores from every 100 generations were plotted. From the plots, the burn-in 
phase appeared to be complete by 30,000 generations. 
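
The sampling scheme translates into sample counts as sketched below. The calculation assumes that samples drawn during the burn-in phase are discarded and ignores the starting state of the chain; the variable names are illustrative.

```python
generations = 1_500_000
sample_every = 100
burn_in_generations = 30_000

samples = generations // sample_every                  # samples drawn per run
burn_in_samples = burn_in_generations // sample_every  # samples inside burn-in
retained = samples - burn_in_samples                   # samples left for inference

print(samples, burn_in_samples, retained)  # 15000 300 14700
```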

TreeViewX version 0.5.1 software (27) was used to visualize the trees. 

To study intraspecific versus interspecific variability, uncorrected (p-distance) 
and corrected (Maximum Likelihood model) distances were calculated in 
PAUP* for the 658 bp COI fragment. 
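
For reference, the uncorrected p-distance is simply the proportion of compared sites at which two aligned sequences differ. The sketch below is a minimal illustration with toy sequences; it skips gaps and ambiguous bases, which is an assumption and may differ from the way PAUP* treats such sites.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of compared sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:  # skip gaps and ambiguous bases
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example: 2 differences over 8 sites.
print(100 * p_distance("ACGTACGT", "ACGTACTA"))  # 25.0 (%)
```

The corrected (maximum likelihood) distances additionally account for multiple substitutions at the same site under the chosen substitution model, which is why they are equal to or larger than the p-distances.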


3. Results 

A 658 bp fragment of the mitochondrial COI gene was successfully amplified 
and sequenced for 52 fly specimens. 

Identification of all sequences against the GenBank database revealed an 
incongruence between the morphological and molecular identifications of the 
Musca specimens: these had previously been identified as Musca domestica, 
whereas our BLAST analysis places them as M. autumnalis. 

The alignment of all sequences showed no insertions or deletions. The data 
revealed 150 variable positions, of which 109 are parsimony-informative. 
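
The two site counts can be obtained from an alignment as sketched below. A column is variable when it contains more than one nucleotide, and parsimony-informative when at least two different nucleotides each occur in at least two sequences. The toy alignment is hypothetical and ungapped.

```python
from collections import Counter


def site_summary(alignment):
    """Count variable and parsimony-informative columns in an ungapped alignment."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each present in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


toy_alignment = [
    "AAGTC",
    "AAGTT",
    "ACGAT",
    "ACGAT",
]
print(site_summary(toy_alignment))  # (3, 2)
```

In the toy alignment the last column is variable but not informative, because one of its two nucleotides occurs in a single sequence only.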

3.1 Species identification 

The ML tree representing the mitochondrial genetic differentiation of C. vicina, 
C. vomitoria, L. caesar and M. autumnalis, based upon COI data, is 
shown in Figure 5. This tree shows the same species-level clusters as the trees 
obtained using NJ, MP and Bayesian methods. Phylogenetic support for individual 
species nodes was high (>99%) across all four methods, despite minor 
differences in overall topology. 

Hypoderma lineatum (Diptera, Oestridae), used as outgroup, was clearly 
separated from the Muscidae and Calliphoridae families in all analyses (Figures 
A1-A4, Appendix A). These two families are themselves distinct and appear 
monophyletic. Bootstrap values for these two families were 100% for Muscidae in 
all analyses and greater than 87.5% for Calliphoridae in NJ and MP analyses, 
although ML analysis showed weak support (only 59.7% bootstrap). Calliphorid 
species were correctly assigned to the subfamilies Calliphorinae (C. vicina and C. 
vomitoria) and Luciliinae (L. caesar). The two species in the genus Calliphora 
were each grouped with high bootstrap support (>96.5% for C. vicina and >94.7% 
for C. vomitoria) and the two species are clearly distinct. The specimens of M. 
autumnalis and of L. caesar each formed a single cluster with 100% support in 
all analyses. Within each clade there is some variation, although this is not 
strongly supported by bootstrap values (<95%). Only L. caesar3 and L. caesar4 
formed a group with a bootstrap value greater than 96.8% (Figures A1-A4, 
Appendix A). 

3.2 Intraspecific variation 

The distance matrix (Table 1), based on the 658 bp analysed, shows the 
percentage of nucleotide divergence within and between taxa. 
Intraspecific divergence with uncorrected distances (p-distance) 
showed a minimum of 0% for all four species and maxima of 0.7, 0.67, 
0.54 and 1.00% for M. autumnalis, C. vicina, C. vomitoria and L. caesar, 
respectively. Corrected distances (ML distances) revealed intraspecific 
divergence ranging between 0 and 0.71% for M. 
autumnalis, 0 and 0.68% for C. vicina, 0 and 0.54% for C. vomitoria and 0 
and 1.04% for L. caesar (Tables A1-A2, Appendix A). 

Figure 5. Maximum likelihood phylogram of 53 cytochrome c oxidase I (COI) sequences from four Diptera 
species (Musca autumnalis, Calliphora vicina, Calliphora vomitoria and Lucilia caesar) and one outgroup 
(Hypoderma lineatum). Values on tree branches correspond to Neighbor-joining/Maximum 
parsimony/Maximum likelihood/Bayesian inference analyses and indicate support for nodes. M = Musca; C 
= Calliphora; L = Lucilia. 


3.3 Interspecific variation 

Table 1 shows the level of COI nucleotide divergence between the species groups 
used in the analyses. Percentages of interspecific variation range from 4.87 to 19.51% 
(for corrected distances) and from 3.96 to 12.01% (for p-distance). 

Table 1. Percentage divergence within and between Musca autumnalis, Calliphora vicina, 
Calliphora vomitoria and Lucilia caesar at the cytochrome c oxidase I (COI) region. Uncorrected 
distances (p-distances) are shown above the diagonal and corrected distances (maximum likelihood 
distances) below the diagonal. Intraspecific divergence values are shown in bold on the diagonal. 

[Table 1: 4 x 4 matrix with rows and columns M. autumnalis, C. vicina, C. vomitoria and L. caesar.] 

In both cases, the smallest value corresponds to the congeneric species, C. 
vicina and C. vomitoria; values between L. caesar/C. vicina and L. caesar/C. 
vomitoria are lower than those between M. autumnalis and each of the three other 
species; and the highest value was found between C. vicina and M. autumnalis. 


4. Discussion 

The purpose of this study was to evaluate whether COI barcode provides 
sufficient resolution to identify different species of relevant Diptera found in 
forensic scenarios. 

According to the DNA barcode Consortium criteria, species 
identification requires the monophyletic association of each species in a phylogeny 
(12). Here, we performed a phylogenetic analysis using four methods 
(NJ, MP, ML and Bayesian inference), each of which recovered every species as a 
monophyletic group with strong support (Figure 5). The high support 
values for each species node show the potential of the COI marker for species 
discrimination, which is the fundamental premise of the DNA barcoding project. 
Although the COI barcode region by itself does not seem sufficient to deliver a 
strong phylogenetic signal, build phylogenies or resolve taxonomic associations, it 
seems well able to distinguish these four forensically relevant 
species. 

The existence of a threshold value to discriminate species is another 
criterion used in DNA barcoding approaches. This criterion can be based on a 
fixed threshold of 3% or on interspecific nucleotide distances at least 10x 
greater than intraspecific ones. In this study, intraspecific divergence within 
Calliphora species at the COI region shows an average value of 0.24% (0.23% for 
uncorrected distances). According to the 10x criterion, this corresponds to a 
sequence divergence threshold of 2.4% (or 2.3%). With either threshold, 2.4% or 
3%, the congeneric species can be distinguished, because the average 
interspecific divergence (4.9%, or 4.0% for uncorrected distances) is 
greater than both threshold values. 
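
The comparison made in this paragraph can be reproduced with a few lines of Python. The function name and its return structure are illustrative; the input values are the Calliphora averages reported above, in percent.

```python
def passes_thresholds(mean_intra, mean_inter, fixed_threshold=3.0, ratio=10.0):
    """Check the fixed (3%) and relative (10x intraspecific mean) thresholds, in percent."""
    relative_threshold = ratio * mean_intra
    return {
        "relative_threshold": relative_threshold,
        "exceeds_fixed": mean_inter > fixed_threshold,
        "exceeds_relative": mean_inter > relative_threshold,
    }


corrected = passes_thresholds(0.24, 4.9)    # maximum likelihood distances
uncorrected = passes_thresholds(0.23, 4.0)  # p-distances

print(corrected["exceeds_fixed"], corrected["exceeds_relative"])
print(uncorrected["exceeds_fixed"], uncorrected["exceeds_relative"])
```

Both distance measures exceed both thresholds, although the uncorrected average of 4.0% leaves a narrower margin over the 3% value than the corrected one.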

Additionally, the highest value of intraspecific variation corresponds to 
L. caesar (Table 1). This observation is consistent with the variation observed 
(with high bootstrap value) within the clade of this species (Figures A1-A4, 
Appendix A). Regarding interspecific variation, the lowest divergence values are 
observed between the two congeneric species (C. vicina and C. vomitoria). Since 
they belong to the same genus, they are phylogenetically closest and have higher 
genetic similarity. Similarly, the highest interspecific divergence values occur 
between M. autumnalis and the other three species. Because they belong to 
different families (M. autumnalis belongs to Muscidae; Calliphora spp. and L. 
caesar to Calliphoridae), these species are phylogenetically more distant. 


5. Conclusions 

The main aim of this study was to evaluate the effectiveness of COI as a 
marker for the correct identification of forensically relevant insect species. Our 
results suggest that this COI region is suitable for identifying forensically 
relevant insect species, namely the most common flies present. In agreement with 
the DNA barcoding initiative, our data show that the use of thresholds (1,11) 
and the monophyly of species (12) allow correct species identification. 

Additionally, COI proved straightforward to amplify and 
sequence. This facilitates the rapid generation of a sequence from an unknown 
specimen and its subsequent identification, which greatly strengthens the use of 
this region as a molecular tool in forensic entomology studies and other situations 
featuring Diptera of applied importance. 


Chapter 3 - Forensic relevant insects’ identification through GenBank and BOLD databases 


Abstract 

Entomological evidence can be of great importance in 
forensic cases, as it can provide relevant information to guide the 
course of the investigation; the species-level identification of 
specimens found on a corpse is therefore extremely important. The Barcode of Life Data 
System (BOLD) is a new tool for the management of DNA barcoding data. The 
identification system of BOLD is its functional unit for identifying 
specimens: a query sequence is pasted in and compared with reference sequences 
from known specimens, as in other databases (e.g. GenBank from NCBI). 
This study therefore set out to determine to what extent these databases are 
able to identify insect species of forensic relevance. Additionally, the 
effectiveness of the COI marker for DNA barcoding purposes was evaluated. The 
results showed that GenBank identified more sequences than BOLD, 
and also confirmed the potential of COI as a barcode sequence. 

Keywords: forensic science; forensic entomology; database; Barcode of Life Data 
system; DNA barcoding; GenBank. 


1. Introduction 

A dead body is a large food source for a range of organisms and supports 
a large and quickly changing fauna as it decomposes (1). Insects are generally 
the first organisms to colonize the corpse and have been used as indicators 
to determine the postmortem interval (PMI). For forensic entomology purposes, their 
identification to species level is mandatory. 

Molecular genotyping methods could benefit the indispensable 
identification of insect species in forensic cases. In fact, the disadvantages 
of the morphological identification process can be offset by the speed and 
simplicity of molecular analysis, which makes it the best method for identifying 
forensically relevant species. 

In 2003, Hebert and colleagues suggested the existence of a universal 
DNA sequence for identifying species. This sequence, known as the barcode 
sequence, is the pillar of a new and already widespread concept: DNA barcoding 
(2). These authors also proposed a 658-bp region of the mitochondrial 
cytochrome c oxidase subunit I (COI) gene as the primary barcode sequence for 
members of the animal kingdom. 

The idea of a standardized molecular identification system emerged 
progressively and revealed that the creation of an organization responsible for 
managing DNA barcoding data would be essential. Indeed, the Consortium for the 
Barcode of Life (CBOL) is an international initiative that supports the 
development of DNA barcoding and coordinates the collection of DNA barcodes. The 
volume of information already available soon showed the need to build a 
worldwide reference database for the molecular identification of all eukaryotic 
species (3,4). However, a complete barcode library for the animal kingdom would 
require about 100 million records (3). CBOL therefore initiated the construction 
of a new database with an emphasis on DNA barcode sequences, the Barcode of Life 
Data System (BOLD) - www.barcodinglife.org. BOLD is a bioinformatics platform 
that aids the acquisition, storage, analysis and publication of DNA barcode data 
(3), and is a freely available resource for the DNA barcoding community. Unlike 
other well-known sequence repositories (e.g. GenBank from NCBI), BOLD has an 
interactive interface where deposited sequences can be revised and 
taxonomically reassigned (5). 

The Identification System of this platform (BOLD-IDS) matches the DNA 
barcode sequence of an unknown specimen against an assembly of reference 
libraries of barcode sequences for known species. In this way, it is possible to 
determine which species an unidentified specimen belongs to. However, the 
coverage of this database may not be sufficient to discriminate all species. 
Indeed, in September 2010, the total number of available DNA barcode sequences 
was 789,488, corresponding to 75,646 species (6), far fewer than the 100 million 
records mentioned above. 

This study therefore aims to determine to what extent the GenBank and 
BOLD databases are able to identify insect species of forensic relevance. 
Additionally, we intend to demonstrate the effectiveness of the COI marker for 
insect species identification. 


2. Materials and Methods 

2.1 Samples 

The 68 samples (Table B1, Appendix B) included in this study were 
obtained from two previous studies (7,8). The samples were collected from 
air-exposed vertebrate carcasses in the Serra da Estrela Mountain and Oeiras 
regions (Portugal) between December 2007 and July 2008. The entomological 
material was captured with pitfall, “Malaise” and “Schoenly” traps. The 
material was then sorted, identified and stored individually in 70% ethanol. 

The specimens collected were identified only to family level because of 
the difficulties of morphological identification. 

2.2 DNA extraction 

DNA extraction was performed using 2 or 3 adult legs, depending on 
specimen size. Total genomic DNA was extracted using the E.Z.N.A.® Insect 
DNA Isolation Kit (Omega Bio-Tek, USA). In the first step of the procedure, 
samples were ground with a pestle, without liquid nitrogen, and the following 
steps were performed according to the manufacturer’s protocol. However, the 
elution of DNA was slightly modified to ensure maximum yield: two matrix 
incubations were performed using 40 µL of Elution Buffer, preheated to 
60 °C - 70 °C, for 30 - 50 minutes, and each elution was collected in a 
different microtube. 

For DNA barcoding purposes, part of each specimen was preserved so that 
the experiment could be replicated if necessary. 

2.3 Amplification 

Initial amplification of a 658 bp fragment from the 5’ end of the 
mitochondrial COI gene was carried out using the primer pair LCO1490 
(5’-GGTCAACAAATCATAAAGATATTGG-3’) and HCO2198 
(5’-TAAACTTCAGGGTGACCAAAAAATCA-3’) (2). 
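
As a quick sanity check on the primer pair above, the short Python sketch below 
reports the length and GC content of each oligonucleotide and provides a 
reverse-complement helper. It is illustrative only and not part of the original 
protocol; LCO1490 and HCO2198 are the usual designations of these two primer 
sequences, and the function names are hypothetical. 

```python
# Illustrative helper functions (hypothetical names), not part of the protocol.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    "HCO2198": "TAAACTTCAGGGTGACCAAAAAATCA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.1f}%")
```

Both primers have a GC content below 40%, which is consistent with the low 
annealing temperatures used in the first amplification cycles. 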

The PCR mixtures were prepared in a total volume of 25 µL and consisted 
of 1X Colorless GoTaq® Flexi Reaction Buffer (Promega, USA), 100 µM of dNTPs 
(Fermentas, USA), 2 mM MgCl2, 0.4 µM of each primer, 0.32 µg of BSA, 0.02 U 
GoTaq® Flexi DNA Polymerase (Promega, USA), 4 µL of DNA, and water added 
to complete the final volume. Failed amplifications were repeated under the 
same conditions with 5 µL of genomic DNA. 

PCR amplifications were performed in a GeneAmp® PCR System 2700 
thermocycler (Applied Biosystems, USA), using the following conditions: 94 °C 
for 1 minute; 5 cycles of 94 °C for 30 seconds, 45 °C for 1 minute and 72 °C 
for 1 minute; 35 cycles of 94 °C for 1 minute, 50 °C for 1 minute and 30 
seconds, and 72 °C for 1 minute; and a final elongation for 5 minutes at 72 °C, 
followed by holding at 4 °C. For some specimens the annealing temperature 
proved problematic, so an optimized annealing temperature was determined and 
used to amplify those individuals. In those cases, the PCR conditions consisted 
of an initial denaturation step of 1 minute at 94 °C, followed by 40 cycles of 
94 °C for 1 minute, 54 °C for 1 minute and 72 °C for 1 minute, and a final 
elongation step of 5 minutes at 72 °C (9). 
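
The two cycling programs described above can be written down as data, which 
makes it easy to estimate the run time of each. This is a minimal sketch that 
assumes ramping between temperatures and the final 4 °C hold are ignored; the 
`Stage` class and the program names are hypothetical. 

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[int, int]  # (temperature in degrees C, duration in seconds)


@dataclass
class Stage:
    """One block of a thermocycler program, repeated `cycles` times."""
    steps: List[Step]
    cycles: int = 1

    def seconds(self) -> int:
        return self.cycles * sum(duration for _, duration in self.steps)


# Standard program: two-phase cycling (5 low-stringency + 35 regular cycles).
STANDARD = [
    Stage([(94, 60)]),
    Stage([(94, 30), (45, 60), (72, 60)], cycles=5),
    Stage([(94, 60), (50, 90), (72, 60)], cycles=35),
    Stage([(72, 300)]),
]

# Program used for specimens that needed an optimized annealing temperature.
OPTIMIZED = [
    Stage([(94, 60)]),
    Stage([(94, 60), (54, 60), (72, 60)], cycles=40),
    Stage([(72, 300)]),
]


def total_minutes(program: List[Stage]) -> float:
    """Total programmed time in minutes, excluding ramping and holds."""
    return sum(stage.seconds() for stage in program) / 60


print(total_minutes(STANDARD), total_minutes(OPTIMIZED))
```

Under these assumptions the standard program takes about 141 minutes and the 
optimized program about 126 minutes of programmed time. 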

The PCR amplicons were visualized by agarose gel electrophoresis (1.5%), 
stained with RedSafe (iNtRON Biotechnology, Korea), under UV 
transillumination. 

2.4 Sequencing 

Before sequencing, the PCR amplicons were purified with SureClean 
(Bioline, UK), according to the manufacturer’s instructions but with longer 
incubation and centrifugation times, and stored at -20 °C. 

DNA sequencing was bi-directional for all specimens. The primer 
combination used in this step was the same as that used in the PCR 
amplification. Sequencing reactions were performed using the BigDye® Terminator 
v3.1 Sequencing Kit (Applied Biosystems, USA) according to the manufacturer’s 
instructions. Cycle sequencing was performed in a GeneAmp® PCR System 2700 
thermocycler and consisted of an initial denaturation step at 96 °C for 1 
minute, followed by 25 cycles of 10 seconds at 96 °C, 5 seconds at 50 °C, and 4 
minutes at 60 °C. The reaction products were purified according to the 
following steps: transfer the reaction product to a new 1.5 mL microtube 
containing a solution of 1 µL of 3 M sodium acetate (pH 4.6) and 25 µL of 
absolute ethanol; incubate on ice for 30 minutes; centrifuge at maximum 
speed for 25 minutes; discard the supernatant; add 300 µL of 70% ethanol to the 
pellet; centrifuge at maximum speed for 15 minutes; repeat the last three steps 
once more; discard the supernatant; air-dry the samples in the dark. 

Sequencing chromatograms were obtained with the ABI PRISM 310 
Genetic Analyzer (Applied Biosystems, USA). 

2.5 Sequence analysis 

Sequencing chromatograms were edited and corrected with Sequencher® 
v4.0.5 software (Gene Codes Corp., USA). 

The specimens were molecularly identified by pasting their sequence 
records into both BLAST (Basic Local Alignment Search Tool) from NCBI’s 
GenBank (10) and the BOLD-IDS tool from BOLD Systems (6). In GenBank, the 
nucleotide BLAST program was used, searching the nucleotide collection database 
with MEGABLAST, which is the most appropriate option for comparing a query with 
closely related sequences. In BOLD, the search was performed with the BOLD-IDS 
tool for animal identification (which uses the COI barcode), first against the 
“Species Level Barcode Records” database and then, when that search failed to 
give an identification, against the “All Barcode Records on BOLD” database. 
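
The decision rule applied to the search results can be sketched as follows. 
This is a simplified illustration, not the logic implemented by BLAST or 
BOLD-IDS: it assumes each search has already been reduced to a list of (taxon 
name, percent similarity) pairs, and it takes the similarity cut-off used in 
this study (98% for GenBank, 99% for BOLD) as a parameter. The function name, 
the category labels and the example hits are hypothetical. 

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (taxon name, percent similarity to the query)


def classify_match(hits: List[Hit], threshold: float) -> Tuple[str, Optional[str]]:
    """Classify a database search result into one of four outcome categories."""
    passing = [(name, pct) for name, pct in hits if pct >= threshold]
    if not passing:
        return "without identification", None

    best = max(pct for _, pct in passing)
    top = sorted({name for name, pct in passing if pct == best})
    if len(top) > 1:
        # Two or more taxa tie for the best score: ambiguous outcome.
        return "confused identification", "/".join(top)

    name = top[0]
    words = name.split()
    if len(words) == 1:
        return "genus-level identification", name
    if words[1] in ("sp.", "sp"):
        # Records such as "Diptera sp. ..." carry no species-level name.
        return "without identification", None
    return "species-level identification", name


# Illustrative hit lists only; they are not taken from the actual searches.
print(classify_match([("Helina evecta", 100.0), ("Helina reversio", 94.0)], 98.0))
print(classify_match([("Lucilia illustris", 100.0), ("Lucilia caesar", 100.0)], 98.0))
print(classify_match([("Pollenia", 99.36)], 99.0))
```

Keeping the cut-off as a parameter makes it explicit that the two databases 
were read with different thresholds. 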

The sequences that allowed species-level identification were used in 
the next step. The Diptera sequences were aligned using ClustalX v2.0.12 (11) 
and the alignment file for analysis was prepared with BioEdit Sequence 
Alignment Editor v7.0.5.3 (12). To avoid interference in the analyses from 
nucleotides missing at the beginning and end of some sequences, the sequence 
ends were trimmed. The analysis was therefore performed with 593 bp of the COI 
barcode fragment. For the sequence divergence and phylogenetic analyses, the 
file was converted to NEXUS format with the Concatenator v1.1.0 program (13). 
The analyses were performed in PAUP* v4.0b10 (14) and MrBayes v3.1.2 (15). 
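
The trimming step described above can be illustrated with a short Python 
sketch. It is not the procedure carried out in BioEdit; it simply assumes an 
alignment held as a dictionary of equal-length strings and removes leading and 
trailing columns in which at least one sequence lacks a nucleotide. The 
function name and the toy alignment are hypothetical. 

```python
from typing import Dict


def trim_alignment_ends(alignment: Dict[str, str], missing: str = "-?Nn") -> Dict[str, str]:
    """Cut leading and trailing columns where any sequence has missing data."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    def complete(col: int) -> bool:
        return all(seq[col] not in missing for seq in seqs)

    start = 0
    while start < length and not complete(start):
        start += 1
    end = length
    while end > start and not complete(end - 1):
        end -= 1
    return {name: seq[start:end] for name, seq in alignment.items()}


# Toy alignment with ragged ends (illustrative data only).
toy = {"a": "--ACGTAC", "b": "TTACGT--", "c": "NTACGTAC"}
print(trim_alignment_ends(toy))
```

Only the ends are cut; internal columns are left untouched, which matches the 
observation that the alignment contained no insertions or deletions. 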

The optimal model of nucleotide sequence divergence for the 
Neighbor-joining (NJ), Maximum Parsimony (MP) and Maximum Likelihood (ML) 
analyses was determined using Modeltest v3.7 (16), run in PAUP*. According to 
the Akaike information criterion (AIC), the General Time-Reversible + 
Proportion Invariant + Gamma distribution shape parameter (GTR+I+G) model 
was the most suitable for the analysis. For the Bayesian inference analysis, 
the best model was chosen with MrModeltest v2.3 (17) and the analysis was 
performed in MrBayes. 

An NJ tree was obtained using the optimal model, and the support for 
nodes was calculated using 1000 bootstrap replicates. 

The most parsimonious tree was obtained by MP analysis using the 
heuristic search procedure (Tree Bisection and Reconnection algorithm, TBR) 
with a maxtree setting of 1000 trees. The bootstrap values were calculated with 
1000 replicates under the heuristic search procedure. 

For the ML analysis, the GTR+I+G model was also used, with 1000 bootstrap 
replicates and 10 replicates for base tree construction. 
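
For readers unfamiliar with bootstrap support, the resampling step behind each 
replicate can be sketched in a few lines of Python. This is only the 
column-resampling idea, under the assumption that the alignment is a dictionary 
of equal-length strings; the tree building and support counting done by PAUP* 
are not reproduced, and the function name is hypothetical. 

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap pseudo-replicate by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicate = bootstrap_alignment({"a": "ACGTACGT", "b": "ACGAACGA"}, rng)
print(replicate)
```

A tree is then inferred from each pseudo-replicate, and the support for a node 
is the percentage of replicate trees in which that node appears. 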

For the Bayesian inference analysis, the Markov chain Monte Carlo (MCMC) 
method was used in MrBayes (18). This analysis used one cold and three heated 
chains with the GTR+I+G model (obtained as the best model according to AIC). 
Sampling was performed every 100 generations for 1,500,000 generations, and the 
likelihood scores were recorded until stationarity was reached. These records 
showed that the burn-in phase was completed by 30,000 generations. 
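
The sampling scheme above fixes how many trees are available for the posterior 
summary. The following Python sketch does the arithmetic under the simplifying 
assumption of one sample per sampling interval, ignoring the initial state of 
the chain; the function name is hypothetical. 

```python
from typing import Tuple


def mcmc_sample_counts(generations: int, sample_every: int, burn_in_generations: int) -> Tuple[int, int, int]:
    """Return (samples drawn, samples discarded as burn-in, samples retained)."""
    drawn = generations // sample_every
    discarded = burn_in_generations // sample_every
    return drawn, discarded, drawn - discarded


# Values reported in the text: 1,500,000 generations, sampled every 100,
# with the burn-in phase completed by 30,000 generations.
print(mcmc_sample_counts(1_500_000, 100, 30_000))
```

With these settings 15,000 samples are drawn, the first 300 are discarded as 
burn-in and 14,700 remain. 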

The TreeViewX Version 0.5.1 software (19) was used to visualize the 
phylograms obtained from all analyses. 

Uncorrected (p-distance) and corrected (ML) distances were calculated 
in PAUP*, according to the best model previously defined, to evaluate 
intra- and interspecific variability for the 658 bp barcode region. 
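
The uncorrected p-distance is simply the proportion of compared sites at which 
two aligned sequences differ. The Python sketch below illustrates that 
definition and the averaging into intra- and interspecific means; it is not the 
PAUP* implementation, it skips sites with gaps or ambiguous bases (an 
assumption), and the function names and toy sequences are hypothetical. 

```python
from itertools import combinations
from typing import List, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among sites where both bases are A, C, G or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_divergences(records: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Mean intra- and interspecific p-distances (%) for (species, sequence) records."""
    intra: List[float] = []
    inter: List[float] = []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        distance = 100 * p_distance(seq_a, seq_b)
        (intra if sp_a == sp_b else inter).append(distance)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


# Toy data only: two sequences of species "A" and one of species "B".
toy = [("A", "AAAAAAAAAA"), ("A", "AAAAAAAAAT"), ("B", "AAAAATTTTT")]
print(mean_divergences(toy))
```

Corrected (ML) distances additionally apply the substitution model, so they 
are equal to or larger than the corresponding p-distances. 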


3. Results 

A total of 68 sequences from the initial portion of the mitochondrial 
COI gene were successfully obtained. 

The alignment of all sequences used in this study did not show any 
insertion or deletion. 

3.1 GenBank and BOLD identifications 

This study represents an effort to show the functionality and utility of 
species identification with a DNA barcoding marker in successfully 
discriminating between the insect species investigated. The capacity for 
species identification was estimated by comparing the 68 insect sequences, 
analyzed for the COI marker, against the GenBank and BOLD databases (Table B1, 
Appendix B). 

Figure 6 shows the percentage of specimens identified by each database. 
With the GenBank database, 46 of 68 samples (67.6%) were successfully 
identified to species level with a maximum identity value greater than 98%. 
No identification was possible for 19 samples, and 3 samples gave a confused 
identification (the search returned two possible outcomes for the same 
sequence). In the BOLD search, 40 sequences (58.8% of the total) generated a 
correct species-level identification and 17 sequences (25%) were identified 
only to genus level, in both cases with a specimen similarity value greater 
than 99%. This search left 8 sequences without identification and 3 samples 
with confused identification (with respect to species-level identification). 
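
The percentages quoted above follow directly from the counts, as the short 
Python check below shows. The counts are those reported in the text; the 
genus-level count of zero for GenBank is inferred from the fact that the other 
three categories already sum to 68, and the dictionary and function names are 
hypothetical. 

```python
from typing import Dict

# Number of samples per outcome category, as reported for the 68 specimens.
COUNTS = {
    "GenBank": {"species-level": 46, "genus-level": 0, "confused": 3, "without": 19},
    "BOLD": {"species-level": 40, "genus-level": 17, "confused": 3, "without": 8},
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert category counts into percentages of the total, to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


for database, counts in COUNTS.items():
    print(database, sum(counts.values()), percentages(counts))
```

Both rows sum to 68 specimens, which confirms that every sample was assigned to 
exactly one category in each database. 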

In total, 49 specimens were identified as belonging to 11 different species: 
Eudasyphora cyanella (Meigen, 1826), Lucilia caesar (Linnaeus, 1758), Pollenia 
rudis (Fabricius, 1794), Musca autumnalis (De Geer, 1776), Phaonia subventa 
(Harris, 1780), Phaonia tuguriorum (Scopoli, 1763), Helina impucta (Fallén, 
1825), Helina evecta (Harris, 1780), Helina reversio (Harris, 1780), Hydrotaea 
dentipes (Fabricius, 1805) and Hydrotaea armipes (Fallén, 1825). 

3.2 Species identification 

The ML phylogram, showing bootstrap values (from the NJ, MP and ML analyses) 
and posterior probability values (from the Bayesian inference analysis), is 
shown in Figure 7. The NJ, MP, ML and Bayesian inference analyses performed 
with the sequences identified to species level showed identical tree topology 
(Figures B1-B4, Appendix B). Dermestes lardarius (order Coleoptera), used as 
outgroup, was clearly separated from the other specimens in all analyses. All 
analyses were congruent in recognizing 8 lineages in the data set, almost all 
with high bootstrap support (in NJ, MP and ML) and posterior probability (in 
Bayesian inference). Only Hydrotaea dentipes showed a lower bootstrap value 
(52.2%), in the NJ analysis. Indeed, all species were resolved as reciprocally 
monophyletic groups, although some variation can be observed within some 
groups. The phylogenetic analyses also indicate that Phaonia subventa and 
Phaonia tuguriorum never appeared associated as congeneric species. Besides 
this, these two congeneric species always appeared mixed with Helina evecta and 
Helina impucta. The congeneric pairs Helina evecta/Helina reversio and Helina 
impucta/Helina reversio never appeared associated at genus level. On the other 
hand, Helina impucta/Helina evecta always appeared associated as congeneric 
species. Only the NJ analysis showed an association between the congeneric 
species Hydrotaea dentipes and Hydrotaea armipes, with 100% bootstrap support. 
The Lucilia caesar specimen appeared alone in all analyses. 

[Figure 6: bar chart with the categories species-level identification, 
genus-level identification, confused identification and without identification.] 

Figure 6. Percentage of specimens identified according to GenBank (dark blue bars) and BOLD (light blue 
bars) databases. 

Table 2 compares the percentages of intraspecific and interspecific 
nucleotide divergence between congeneric species. Comparing these values, it is 
possible to observe that all intraspecific values are lower than 3% and that 
the interspecific percentages are much higher than this value. Moreover, all 
genera present an interspecific divergence greater than 10x their 
intraspecific divergence. 



Figure 7. Maximum likelihood phylogram of 69 cytochrome c oxidase I (COI) sequences from ten Diptera 
species and one outgroup (Dermestes lardarius). Values on tree branches correspond to 
Neighbor-joining/Maximum parsimony/Maximum likelihood/Bayesian inference and indicate support for nodes. 

Table 2. Summary of intra- and interspecific percentages of nucleotide divergence at cytochrome c oxidase I 
(p-distances and ML distances) for the genera Phaonia, Helina and Hydrotaea. 

             p-distances (%)                  ML distances (%) 
Genus        Intraspecific   Interspecific    Intraspecific   Interspecific 
Phaonia      0.05            14.55            0.05            16.81 
Helina       0.63             9.96            0.64            11.16 
Hydrotaea    0.16             8.07            0.16             8.83 


4. Discussion 

The comparison between the two molecular databases, GenBank and 
BOLD, reveals that GenBank can identify more query sequences than BOLD. This 
may be because GenBank is a more comprehensive database than BOLD (which is a 
more recent and more specific database). Another factor may be the search 
tools. These databases use different algorithms to calculate the similarity 
between reference and query sequences, and this can generate discrepancies in 
identification. In the GenBank search, 98% was used as the limit for species 
identification because values below this assigned the query sequences to a 
different species from the one returned with values greater than 98%. In the 
BOLD search, this value was 99%. According to this database, a species-level 
match cannot be made with values lower than 99%; in that case only the nearest 
neighbor species is returned. 

Comparing the performance of these four tree-building methods, it can be 
considered that all give similar results, recovering each species as a 
monophyletic group. Moreover, almost all bootstrap and posterior probability 
values were high, showing the potential of this genetic marker as a 
trustworthy marker for species identification. 

However, some limitations were observed in the phylogenetic analysis. The 
lack of association between some congeneric species calls the power of this 
marker into question. Unfortunately, too few sequences of some species were 
available for a more detailed analysis, and the missing information at the 
beginning and end of the sequences may have interfered, giving unrealistic 
results (considering that the taxonomy at species level is well defined). The 
choice of outgroup may also have interfered with the phylogenetic structure we 
would expect from this dataset: it may be too distant an outgroup to yield a 
more clearly defined tree. 

The mean intraspecific and interspecific variation values were 
calculated only for genera with two or more congeneric species. Taking into 
account the threshold values given for species discrimination, 3% (2) and 10x 
the mean intraspecific divergence for each genus (20), the results showed that 
it was possible to distinguish the two species of Phaonia (Phaonia subventa and 
Phaonia tuguriorum), the two species of Hydrotaea (Hydrotaea dentipes and 
Hydrotaea armipes), and the three species of Helina (Helina impucta, Helina 
evecta and Helina reversio). Indeed, the mean intraspecific variations of 
0.05%, 0.64% and 0.16% for Phaonia, Helina and Hydrotaea are lower than the 3% 
threshold. In addition, the 10x rule gives threshold values of 0.5%, 6.4% and 
1.6%, respectively, and in all cases these values were lower than the mean 
interspecific variation (16.81% for Phaonia, 11.16% for Helina and 8.83% for 
Hydrotaea). 
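
The two threshold tests discussed above can be reproduced from the ML distances 
in Table 2 with a few lines of Python. This is a sketch of the arithmetic only; 
the function and variable names are hypothetical. 

```python
from typing import Dict, Tuple

# Mean ML divergences (%) per genus, copied from Table 2:
# (intraspecific, interspecific)
TABLE_2_ML: Dict[str, Tuple[float, float]] = {
    "Phaonia": (0.05, 16.81),
    "Helina": (0.64, 11.16),
    "Hydrotaea": (0.16, 8.83),
}


def check_thresholds(intra: float, inter: float, fixed: float = 3.0, factor: float = 10.0) -> dict:
    """Apply the fixed 3% threshold and the 10x intraspecific rule to one genus."""
    ten_x = round(factor * intra, 2)
    return {
        "ten_x_threshold": ten_x,
        "passes_fixed": intra < fixed < inter,
        "passes_ten_x": inter > ten_x,
    }


for genus, (intra, inter) in TABLE_2_ML.items():
    print(genus, check_thresholds(intra, inter))
```

All three genera pass both tests, with Helina having the smallest margin under 
the 10x rule (11.16% against a threshold of 6.4%). 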


5. Conclusions 

The best approach to identifying an unknown sequence is to check whether 
that sequence already exists in a public database. The identification of 
Diptera species of forensic relevance has proved to be extremely important for 
the progress of an investigation. The main aim of this study was to determine 
to what extent the GenBank and BOLD databases are able to identify these 
species. It was possible to determine that these two databases allow the 
identification of a good percentage of species of forensic interest. However, 
any effort that contributes to a better understanding of biodiversity (in 
particular, that of forensic interest; in general, the quantification of 
biodiversity) is of utmost importance, and the implementation of a new 
database covering this part of biological diversity is a good step towards 
this knowledge. The establishment of a standard protocol may contribute to the 
faster growth of this database. Consequently, we also tested the effectiveness 
of the COI barcode for use in a standard protocol. The results support the 
potential of this genetic marker. However, more comprehensive studies should 
be developed, with more samples and other genetic markers, to overcome some of 
the difficulties encountered in this study. 

Final Considerations 

This study was the first molecular approach to assess the potential of 
DNA barcoding, especially of the COI marker, for inclusion in a database of 
species of forensic interest. In addition, such a database adds knowledge of 
biodiversity that can be used in other ecological and conservation contexts. 
Indeed, Portugal is a country with very particular geoclimatic conditions, and 
the survey of its biodiversity is extremely important because it can reveal 
unknown endemic species and thus contribute to the global understanding of 
biological diversity. 

In this study, morphological identification was surpassed by the 
molecular approach, since morphological identification proved weak for some 
species. The weakness of the morphological methodologies lies mainly in the 
difficulty of observing some diagnostic morphological characters, which can 
lead to an incorrect identification. This weakness reinforces the importance 
of molecular identification. 

The successful amplification and sequencing of the COI marker showed its 
potential for use in a standard protocol that quickly yields sequences and the 
subsequent identification of species. Using a well-supported protocol as the 
standard in forensic investigation services will facilitate the course of 
investigations, both in the context of forensic medicine and in the context of 
crimes against wildlife. 

Figure A1. Neighbor-joining phylogram of 53 cytochrome c oxidase I (COI) sequences from four Diptera 
species (Musca autumnalis, Calliphora vicina, Calliphora vomitoria and Lucilia caesar) and one outgroup 
(Hypoderma lineatum). Bootstrap values indicate support for nodes among 1000 bootstrap replicates. M = 
Musca; C = Calliphora; L = Lucilia. 

Figure A2. Maximum parsimony phylogram of heuristic search procedure (Tree Bisection and 
Reconnection algorithm, TBR) for 53 cytochrome c oxidase I (COI) sequences from four Diptera species 
(Musca autumnalis, Calliphora vicina, Calliphora vomitoria and Lucilia caesar) and one outgroup 
(Hypoderma lineatum). Bootstrap values indicate support for nodes among 1000 bootstrap replicates 
(heuristic search procedure). M = Musca; C = Calliphora; L = Lucilia. 

Figure A3. Maximum likelihood phylogram of 53 cytochrome c oxidase I (COI) sequences from four Diptera 
species (Musca autumnalis, Calliphora vicina, Calliphora vomitoria and Lucilia caesar) and one outgroup 
(Hypoderma lineatum). Bootstrap values indicate support for nodes among 1000 bootstrap replicates. M = 
Musca; C = Calliphora; L = Lucilia. 

Figure A4. Bayesian phylogeny of 53 cytochrome c oxidase I (COI) sequences from four Diptera species 
(Musca autumnalis, Calliphora vicina, Calliphora vomitoria and Lucilia caesar) and one outgroup 
(Hypoderma lineatum). Values on tree branches indicate posterior probability for nodes. M = Musca; C = 
Calliphora; L = Lucilia. 

Table B1. Molecular identification of samples used in this study (68 specimens) with GenBank and BOLD 
databases. 




Molecular identification 

Sample    GenBank                           Max. identity (%)  BOLD                              Specimen similarity (%) 
An1A      Diptera sp. BOLD                  99                 Delia                             100 
An1B      Diptera sp. BOLD                  99                 Delia                             100 
An1C      Diptera sp. BOLD                  100                Delia                             100 
An2       Diptera sp. BOLD                  99                 Delia                             100 
An3       Diptera sp. BOLD                  100                Delia                             100 
An4       Diptera sp. BOLD                  100                Delia                             100 
An1Bo     Diptera sp. BOLD                  92                 ?                                 ? 
MusIII1A  Helina evecta                     100                Helina evecta                     100 
MusIII1B  Helina evecta                     98                 Helina evecta                     98.9 
MusIII1C  Helina evecta                     100                Helina evecta                     100 
MusIV1    Eudasyphora cyanella              99                 Eudasyphora cyanella              99.8 
MusIV2A   Eudasyphora cyanella              99                 Eudasyphora cyanella              99.9 
MusIV2B   Eudasyphora cyanella              100                Eudasyphora cyanella              100 
MusIV2C   Lucilia caesar                    100                Lucilia caesar                    100 
Hy3A      Eudasyphora cyanella              99                 Eudasyphora cyanella              99.7 
Hy3B      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3C      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3D      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3E      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3F      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3G      Eudasyphora cyanella              100                Eudasyphora cyanella              99.8 
Hy3H      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3I      Eudasyphora cyanella              100                Eudasyphora cyanella              100 
Hy3J      Eudasyphora cyanella              99                 Eudasyphora cyanella              99.3 
Ch1       Lucilia illustris/Lucilia caesar  -                  Lucilia illustris/Lucilia caesar  - 
Po1       Pollenia rudis                    100                Pollenia                          100 
Po2A      Pollenia rudis                    99                 Pollenia                          99.36 
Po2B      Pollenia rudis                    99                 Pollenia                          99.36 
Po3       Pollenia rudis                    99                 Pollenia                          99.33 
Po4       Pollenia rudis                    100                Pollenia                          100 
Po5A      Pollenia rudis                    100                Pollenia                          100 
Po5B      Diptera sp. BOLD                  90                 ?                                 ? 
Po5C      Pollenia rudis                    99                 Pollenia                          99.17 
Po5E      Pollenia rudis                    99                 Pollenia                          100 
Ph1       Phaonia subventa                  99                 Phaonia subventa                  99.17 
Ph2A      Diptera sp. BOLD                  90                 ?                                 ? 
Ph2B      Diptera sp. BOLD                  90                 ?                                 ? 
Ph2C      Diptera sp. BOLD                  90                 ?                                 ? 
Ph3A      Diptera sp. BOLD                  93                 Phaonia errans                    93.94 
Ph3B      Diptera sp. BOLD                  96                 ?                                 ? 
Ph4       Phaonia subventa                  99                 Phaonia subventa                  99.12 
Ph5       Phaonia subventa                  99                 Phaonia subventa                  99.1 
MuI1      Musca autumnalis                  99                 Musca autumnalis                  99.7 
MuI3      Musca autumnalis                  99                 Musca autumnalis                  99.7 
MuI4      Musca autumnalis                  99                 Musca autumnalis                  100 
MuI5      Musca autumnalis                  99                 Musca autumnalis                  99.7 
MuII7     Muscina levida/Muscina assimilis  -                  Muscina levida/Muscina assimilis  - 
MuII8     Muscina levida/Muscina assimilis  -                  Muscina levida/Muscina assimilis  - 
MuII9A    Diptera sp. BOLD                  98                 Helina reversio                   99 
MuII9B    Diptera sp. BOLD                  98                 Helina                            98 
MuII9C    Diptera sp. BOLD                  98                 Helina reversio                   99 
MuIII10   Phaonia tuguriorum                99                 Phaonia tuguriorum                100 
MuIII11   Diptera sp. BOLD                  90                 ?                                 ? 
MuIII12   Helina impucta                    98                 Helina                            98 
MuIII13A  Hydrotaea dentipes                99                 Hydrotaea dentipes                100 
MuIII13B  Hydrotaea dentipes                99                 Hydrotaea dentipes                100 
MuIII14A  Phaonia tuguriorum                100                Phaonia tuguriorum                100 
MuIII14C  Phaonia tuguriorum                100                Phaonia tuguriorum                100 
Hy1B      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy1C      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy1D      Hydrotaea dentipes                99                 Hydrotaea dentipes                100 
Hy1E      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy1F      Diptera sp. BOLD                  93                 Hydrotaea armipes                 100 
Hy2A      Hydrotaea dentipes                99                 Hydrotaea dentipes                100 
Hy2B      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy2C      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy2D      Hydrotaea dentipes                100                Hydrotaea dentipes                100 
Hy2E      Diptera sp. BOLD                  100                Muscina                           100 

?: no match returned; -: no single value reported (two possible outcomes). 

Figure B1. Neighbor-joining phylogram of 69 cytochrome c oxidase I (COI) sequences from ten Diptera 
species and one outgroup (Dermestes lardarius). Bootstrap values indicate support for 
nodes among 1000 bootstrap replicates. 

Figure B2. Maximum parsimony phylogram of heuristic search procedure (Tree Bisection and 
Reconnection algorithm, TBR) for 69 cytochrome c oxidase I (COI) sequences from ten Diptera species and 
one outgroup (Dermestes lardarius). Bootstrap values indicate support for nodes among 1000 bootstrap 
replicates (heuristic search procedure). 

Figure B3. Maximum likelihood phylogram of 69 cytochrome c oxidase I (COI) sequences from ten Diptera 
species and one outgroup (Dermestes lardarius). Bootstrap values indicate support for nodes among 1000 
bootstrap replicates. 

Figure B4. Bayesian phylogeny of 69 cytochrome c oxidase I (COI) sequences from ten Diptera species and 
one outgroup (Dermestes lardarius). Values on tree branches indicate posterior probability for nodes. 